machine learning model evaluation
Machine Learning Models Evaluation and Feature Importance Analysis on NPL Dataset
Fekadu, Rufael, Getachew, Anteneh, Tadele, Yishak, Ali, Nuredin, Goytom, Israel
Predicting the probability of non-performing loans for individuals plays a vital role in helping banks decrease credit risk and make the right decisions before granting a loan. These decisions are typically based on a credit study, conducted in accordance with generally accepted standards, together with the loan payment history and demographic data of the clients. In this work, we evaluate how different machine learning models, namely Random Forest, Decision Tree, KNN, SVM, and XGBoost, perform on a dataset provided by a private bank in Ethiopia. Motivated by this evaluation, we further explore different feature selection methods to identify the features that matter most to the bank. Our findings show that XGBoost achieves the highest F1 score on the KMeans SMOTE over-sampled data. We also find that the most important features in evaluating credit risk are the age of the applicant, years of employment, and total income of the applicant, rather than collateral-related features. Work done when the authors were research interns at Chapa. Equally contributed to this work.
Introduction to Machine Learning Model Evaluation - Pirate Press
If we were to list the technologies that have revolutionized and changed our lives for the better, Machine Learning would occupy the top spot. This cutting-edge technology is used in a wide variety of applications in day-to-day life. ML has become an integral component in most industries, such as Healthcare, Software, Manufacturing, and Business, and aims to solve many complex problems while reducing human effort and dependency. It does this by accurately predicting solutions for problems across various applications. Generally, there are two main phases in machine learning.
Machine Learning Model Evaluation
So, the solution is to split our data into training and testing sets. Training data (in-sample data) is used to train the model, and test data (out-of-sample data) is used to evaluate how the model performs on new data or in the real world. When we split the data, we usually keep 70 percent for training and 30 percent for testing. We will use sklearn to split our data. To help validate our model, Python also has a function that can be imported from the sklearn library.
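The 70/30 split described above can be sketched with sklearn's `train_test_split`. This is a minimal illustration, not the original article's code: the synthetic dataset from `make_classification`, the `LogisticRegression` model, the accuracy metric, and the `random_state` values are all assumptions chosen only to make the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 1000 rows, 10 features, binary target.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Hold out 30 percent of the rows as out-of-sample test data.
# stratify=y keeps the class proportions similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit on the training data only (the model choice is an assumption).
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Score on rows the model has never seen during training.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"train rows: {len(X_train)}, test rows: {len(X_test)}")
print(f"test accuracy: {test_accuracy:.3f}")
```

Fixing `random_state` makes the split reproducible, which helps when comparing several models on the same held-out rows.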